AITopics | smile string

Collaborating Authors

smile string

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

De novo Drug Design using Reinforcement Learning with Multiple GPTAgents

Neural Information Processing SystemsApr-25-2026, 07:58:27 GMT

De novo drug design is a pivotal issue in pharmacology and a new area of focus in AI for science research. A central challenge in this field is to generate molecules with specific properties while also producing a wide range of diverse candidates. Although advanced technologies such as transformer models and reinforcement learning have been applied in drug design, their potential has not been fully realized. Therefore, we propose MolRL-MGPT, a reinforcement learning algorithm with multiple GPT agents for drug molecular generation. To promote molecular diversity, we encourage the agents to collaborate in searching for desirable molecules in diverse directions. Our algorithm has shown promising results on the GuacaMol benchmark and exhibits efficacy in designing inhibitors against SARS-CoV-2 protein targets. The codes are available at: https://github.com/HXYfighter/

Add feedback

comprehensive benchmark on eight tasks

Neural Information Processing SystemsFeb-16-2026, 19:28:53 GMT

Large Language Models (LLMs) with strong abilities in natural language processing tasks have emerged and have been applied in various kinds of areas such as science, finance and software engineering.

large language model, machine learning, natural language, (18 more...)

Neural Information Processing Systems

Country:

North America > United States (0.93)
North America > Dominican Republic (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Pakistan (0.04)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study > Negative Result (0.46)

Industry:

Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)
Government > Regional Government > North America Government > United States Government (0.68)
(2 more...)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.80)

Add feedback

1 Datasheet for QM1B

Neural Information Processing SystemsFeb-16-2026, 12:09:50 GMT

As recommended by the NeurIPS dataset and benchmark track, we documented QM1B and intended uses through the Datasheets for Datasets framework [1]. The goal of dataset datasheets as outlined by [1] is to provide a standardized process for documentating datasets. The authors of [1] present a list of carefully selected questions which dataset authors should answer. We hope our answers to these questions will facilitate better communication between us (the dataset creators) and future users of QM1B. For what purpose was the dataset created? Prior gaussian-based Density Functional Theory (DFT) datasets contained fewer than 20 million training examples.

artificial intelligence, inductive learning, machine learning, (19 more...)

Neural Information Processing Systems

Industry: Law (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Inductive Learning (0.57)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.50)

Add feedback

Predicting Polymer Solubility in Solvents Using SMILES Strings

Reinhard, Andrew

arXiv.org Artificial IntelligenceDec-11-2025

Understanding and predicting polymer solubility in various solvents is critical for applications ranging from recycling to pharmaceutical formulation. This work presents a deep learning framework that predicts polymer solubility, expressed as weight percent (wt%), directly from SMILES representations of both polymers and solvents. A dataset of 8,049 polymer solvent pairs at 25 deg C was constructed from calibrated molecular dynamics simulations (Zhou et al., 2023), and molecular descriptors and fingerprints were combined into a 2,394 feature representation per sample. A fully connected neural network with six hidden layers was trained using the Adam optimizer and evaluated using mean squared error loss, achieving strong agreement between predicted and actual solubility values. Generalizability was demonstrated using experimentally measured data from the Materials Genome Project, where the model maintained high accuracy on 25 unseen polymer solvent combinations. These findings highlight the viability of SMILES based machine learning models for scalable solubility prediction and high-throughput solvent screening, supporting applications in green chemistry, polymer processing, and materials design.

artificial intelligence, machine learning, polymer, (17 more...)

arXiv.org Artificial Intelligence

2512.09784

Genre: Research Report (0.82)

Industry: Materials > Chemicals > Commodity Chemicals > Petrochemicals > Polymers & Plastics (0.94)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders

Cohen, Jaron, Hasson, Alexander G., Tanovic, Sara

arXiv.org Artificial IntelligenceDec-10-2025

Since the advent of machine learning, interpretability has remained a persistent challenge, becoming increasingly urgent as generative models support high-stakes applications in drug and material discovery. Recent advances in large language model (LLM) architectures have yielded chemistry language models (CLMs) with impressive capabilities in molecular property prediction and molecular generation. However, how these models internally represent chemical knowledge remains poorly understood. In this work, we extend sparse autoencoder techniques to uncover and examine interpretable features within CLMs. Applying our methodology to the Foundation Models for Materials (FM4M) SMI-TED chemistry foundation model, we extract semantically meaningful latent features and analyse their activation patterns across diverse molecular datasets. Our findings reveal that these models encode a rich landscape of chemical concepts. We identify correlations between specific latent features and distinct domains of chemical knowledge, including structural motifs, physicochemical properties, and pharmacological drug classes. Our approach provides a generalisable framework for uncovering latent knowledge in chemistry-focused AI systems. This work has implications for both foundational understanding and practical deployment; with the potential to accelerate computational chemistry research.

artificial intelligence, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2512.08077

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.94)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.86)

Add feedback

Look the Other Way: Designing 'Positive' Molecules with Negative Data via Task Arithmetic

Özçelik, Rıza, de Ruiter, Sarah, Grisoni, Francesca

arXiv.org Artificial IntelligenceDec-9-2025

The scarcity of molecules with desirable properties (i.e., `positive' molecules) is an inherent bottleneck for generative molecule design. To sidestep such obstacle, here we propose molecular task arithmetic: training a model on diverse and abundant negative examples to learn 'property directions' - without accessing any positively labeled data - and moving models in the opposite property directions to generate positive molecules. When analyzed on 33 design experiments with distinct molecular entities (small molecules, proteins), model architectures, and scales, molecular task arithmetic generated more diverse and successful designs than models trained on positive molecules in general. Moreover, we employed molecular task arithmetic in dual-objective and few-shot design tasks. We find that molecular task arithmetic can consistently increase the diversity of designs while maintaining desirable complex design properties, such as good docking scores to a protein. With its simplicity, data efficiency, and performance, molecular task arithmetic bears the potential to become the de facto transfer learning strategy for de novo molecule design.

artificial intelligence, machine learning, molecule, (17 more...)

arXiv.org Artificial Intelligence

2507.17876

Country: Europe > Netherlands (0.14)

Genre: Research Report > New Finding (0.93)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Add feedback

Universally Converging Representations of Matter Across Scientific Foundation Models

Edamadaka, Sathya, Yang, Soojung, Li, Ju, Gómez-Bombarelli, Rafael

arXiv.org Artificial IntelligenceDec-4-2025

Machine learning models of vastly different modalities and architectures are being trained to predict the behavior of molecules, materials, and proteins. However, it remains unclear whether they learn similar internal representations of matter. Understanding their latent structure is essential for building scientific foundation models that generalize reliably beyond their training domains. Although representational convergence has been observed in language and vision, its counterpart in the sciences has not been systematically explored. Here, we show that representations learned by nearly sixty scientific models, spanning string-, graph-, 3D atomistic, and protein-based modalities, are highly aligned across a wide range of chemical systems. Models trained on different datasets have highly similar representations of small molecules, and machine learning interatomic potentials converge in representation space as they improve in performance, suggesting that foundation models learn a common underlying representation of physical reality. We then show two distinct regimes of scientific models: on inputs similar to those seen during training, high-performing models align closely and weak models diverge into local sub-optima in representation space; on vastly different structures from those seen during training, nearly all models collapse onto a low-information representation, indicating that today's models remain limited by training data and inductive bias and do not yet encode truly universal structure. Our findings establish representational alignment as a quantitative benchmark for foundation-level generality in scientific models. More broadly, our work can track the emergence of universal representations of matter as models scale, and for selecting and distilling models whose learned representations transfer best across modalities, domains of matter, and scientific tasks.

artificial intelligence, machine learning, representation, (19 more...)

arXiv.org Artificial Intelligence

2512.0375

Country: North America > United States > Massachusetts (0.14)

Genre: Research Report > New Finding (0.48)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)

Add feedback

Pharmacophore-based design by learning on voxel grids

Mahmood, Omar, Pinheiro, Pedro O., Bonneau, Richard, Saremi, Saeed, Sresht, Vishnu

arXiv.org Artificial IntelligenceDec-3-2025

Ligand-based drug discovery (LBDD) relies on making use of known binders to a protein target to find structurally diverse molecules similarly likely to bind. This process typically involves a brute force search of the known binder (query) against a molecular library using some metric of molecular similarity. One popular approach overlays the pharmacophore-shape profile of the known binder to 3D conformations enumerated for each of the library molecules, computes overlaps, and picks a set of diverse library molecules with high overlaps. While this virtual screening workflow has had considerable success in hit diversification, scaffold hopping, and patent busting, it scales poorly with library sizes and restricts candidate generation to existing library compounds. Leveraging recent advances in voxel-based generative modelling, we propose a pharmacophore-based generative model and workflows that address the scaling and fecundity issues of conventional pharmacophore-based virtual screening. We introduce \emph{VoxCap}, a voxel captioning method for generating SMILES strings from voxelised molecular representations. We propose two workflows as practical use cases as well as benchmarks for pharmacophore-based generation: \emph{de-novo} design, in which we aim to generate new molecules with high pharmacophore-shape similarities to query molecules, and fast search, which aims to combine generative design with a cheap 2D substructure similarity search for efficient hit identification. Our results show that VoxCap significantly outperforms previous methods in generating diverse \textit{de-novo} hits. When combined with our fast search workflow, VoxCap reduces computational time by orders of magnitude while returning hits for all query molecules, enabling the search of large libraries that are intractable to search by brute force.

artificial intelligence, machine learning, molecule, (18 more...)

arXiv.org Artificial Intelligence

2512.02031

Genre:

Workflow (0.96)
Research Report > New Finding (0.54)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space

Park, Jun-Hyoung, Song, Ho-Jun, Lee, Seong-Whan

arXiv.org Artificial IntelligenceNov-19-2025

Deep learning-based molecular generation models have shown great potential in efficiently exploring vast chemical spaces by generating potential drug candidates with desired properties. However, these models often produce chemically invalid molecules, which limits the usable scope of the learned chemical space and poses significant challenges for practical applications. To address this issue, we propose ChemFixer, a framework designed to correct invalid molecules into valid ones. ChemFixer is built on a transformer architecture, pre-trained using masking techniques, and fine-tuned on a large-scale dataset of valid/invalid molecular pairs that we constructed. Through comprehensive evaluations across diverse generative models, ChemFixer improved molecular validity while effectively preserving the chemical and biological distributional properties of the original outputs. This indicates that ChemFixer can recover molecules that could not be previously generated, thereby expanding the diversity of potential drug candidates. Furthermore, ChemFixer was effectively applied to a drug-target interaction (DTI) prediction task using limited data, improving the validity of generated ligands and discovering promising ligand-protein pairs. These results suggest that ChemFixer is not only effective in data-limited scenarios, but also extensible to a wide range of downstream tasks. Taken together, ChemFixer shows promise as a practical tool for various stages of deep learning-based drug discovery, enhancing molecular validity and expanding accessible chemical space.

artificial intelligence, machine learning, natural language, (17 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/JBHI.2025.3593825

2511.13758

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Filters

Collaborating Authors

smile string

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

De novo Drug Design using Reinforcement Learning with Multiple GPTAgents

comprehensive benchmark on eight tasks

1 Datasheet for QM1B

Benchmark_Sample_Efficiency_neurips_data

Predicting Polymer Solubility in Solvents Using SMILES Strings

Unveiling Latent Knowledge in Chemistry Language Models through Sparse Autoencoders

Look the Other Way: Designing 'Positive' Molecules with Negative Data via Task Arithmetic

Universally Converging Representations of Matter Across Scientific Foundation Models

Pharmacophore-based design by learning on voxel grids

ChemFixer: Correcting Invalid Molecules to Unlock Previously Unseen Chemical Space